This jupyter notebook will guide you how to use SIF Level 2 data (ungridded SIF retrievals for each individual satellite footprint) in your research. I will also go through some of the typical pitfalls and misconceptions in using the data. The examples will be based mostly on TROPOMI and OCO-2 data but can easily be adapted to also access GOME-2 and SCIAMACHY data. In fact, some of the provided readers will be easily adaptable to ingest any arbitrary Level 2 dataset in HDF5 (or netCDF4).
First of all, almost all scientific datasets are now provided in either HDF or netCDF formats. We currently use the latest standards, i.e. HDF5 and netCDF4 (which is basically a layer on top of HDF5, so any HDF5 reader can access netCDF datasets). Typically, datasets include comments and units as attributes, so that you know what you are dealing with even without a user guide. In python, there are very convenient tools to access these files, h5py (http://docs.h5py.org/en/stable/) and netCDF4 (http://unidata.github.io/netcdf4-python/). A good start to look at files in the beginning is to use viewing tools like:
Setting up the Jupyter environment (typical imports)
In [1]:
#%matplotlib inline
## The usual packages (numpy, matplotlib, etc)
import matplotlib.pyplot as plt
import numpy as np
# nicer figures using ggg plot style.
plt.style.use('ggplot')
from IPython.core.display import HTML
from IPython.display import clear_output
from IPython.core.pylabtools import figsize
In [2]:
# This file L2_tools.py is in the same folder here and will provide easy access to TROPOMI and OCO-2 L2 data
from L2_tools import *
paths for OCO-2 and TROPOMI data (in fact, the path just points to our internal structure of the ftp server at ftp://fluo.gps.caltech.edu/data/). So once you copy the folders (I suggest you learn how to use wget https://www.gnu.org/software/wget/ or curl https://www.tecmint.com/linux-curl-command-examples/)
In [3]:
# Please adapt to your local directories after having mirrored our ftp data (and extracted the ungridded TROPOMI files):
path_tropomi = '/Volumes/data2/projects/TROPOMI/nc_ungridded/'
path_oco2 = '/Volumes/data1/ftp/data/OCO2/sif_lite_B8100/'
# This is still cumbersome, the units are part of the netCDF4 files but I hardcode them for now:
t_unit_oco2 = "seconds since 1993-1-1 0:0:0"
t_unit_tropomi = "seconds since 1970-01-01 00:00:00"
t_cal = u"gregorian"
In [4]:
# Let us read in some OCO-2 and TROPOMI data over a non-fluorescing region in the Sahara
# Define region boundaries for a part of the Sahara:
latMin = 20
latMax = 23
lonMin = 5
lonMax = 8
Now read in the data, two dictionaries for OCO-2 and TROPOMI were included in L2_tools.py, it basically just enables you to decides which fields to read in and can be adapted (actually to read in any kind of data) The first entry should provide a path to all files you want to read (you can use wildcard!)
In [5]:
oco2_sahara = L2(path_oco2+'2018/0[45678]/*.nc4', dict_oco2,latMin, latMax, lonMin, lonMax )
In [6]:
# TROPOMI will take a bit more time, be patient!
tropomi_sahara = L2(path_tropomi+'*2018-0[45678]*.nc', dict_tropomi,latMin, latMax, lonMin, lonMax )
The syntax for reading in data is as follows: data = L2(path_to_data, dictionary,latMin, latMax, lonMin, lonMax ) Find some examples for the dictionary below (these can be adapted to read in GOME-2 data). The format describes the "key" as how you want to name the variable in your returned class and the "item" as the variable name of the HDF5 file.
How does the reading routine work? It basically goes through the HDF5 files and uses a dictionary to decide which variables to read in. Then, it just concatenates all the data into a larger set and stores everything in a class (just think of it like a structure for now)
Take a sneak peak at the dictionaries used, they basically describe which variable to read in (2nd componend) and how to call them in the class structure (first component). These are very generic, so you use them to read in OCO-2 XCO2 lite files as well (just require a new dictionary). In the future, we may add the use of json files as dictionary input.
In [7]:
dict_oco2
Out[7]:
In [8]:
dict_tropomi
Out[8]:
In [9]:
# First convert oco2 time units to a python time (this is a clunky step right now, need to automate later!)
oco2_sahara.time_python = convert_time(oco2_sahara.time, t_unit_oco2, t_cal)
tropomi_sahara.time_python = convert_time(tropomi_sahara.time, t_unit_tropomi, t_cal)
In [10]:
# now lets plot a timeseries
figsize(12,6)
plt.subplot(1,2,1)
plt.plot(oco2_sahara.time_python, oco2_sahara.sif_757,'.')
plt.ylabel('SIF (W/m$^2$/sr/$\mu$m)')
plt.title('OCO2 (SIF at 758nm)')
plt.subplot(1,2,2)
plt.plot(tropomi_sahara.time_python, tropomi_sahara.sif,'.')
plt.ylabel('SIF (W/m$^2$/sr/$\mu$m)')
plt.title('TROPOMI (SIF at 740nm)')
Out[10]:
In [11]:
# let us look at a histogram
plt.hist(oco2_sahara.sif_757, 50,range=(-3,3), density=True, alpha=0.5, label='OCO-2')
plt.hist(tropomi_sahara.sif, 50,range=(-3,3), density=True, alpha=0.5, label='TROPOMI')
plt.legend(loc=0)
Out[11]:
In [12]:
print('Standard deviation of measured OCO-2 SIF data in Sahara ' + str(np.std(oco2_sahara.sif_757)) )
print('Expected standard deviation based on posteriori error ' + str(np.mean(oco2_sahara.sif_757_sigma)) )
In [13]:
print('Mean if we exclude negative data ' + str(np.mean(oco2_sahara.sif_757[oco2_sahara.sif_757>=0])))
print('True mean ' + str(np.mean(oco2_sahara.sif_757)))
You can see 2 things here: a) removing negative values is dangerous. If the true value is 0 and we have measurement noise, negative retrievals are not unphysical if they are within the posterior noise estimate; b) we sometimes have averages that are below (e.g. -0.12 here). This can happen and really depends on the posterior bias correction. We are still working on this but removing small biases on the order of 0.1 is really hard.
In [14]:
# Define region boundaries (roughly Iowa)
latMin_cb = 40
latMax_cb = 43
lonMin_cb = -96
lonMax_cb = -91
In [15]:
oco2_iowa = L2(path_oco2+'2018/0[45678]/*.nc4', dict_oco2,latMin_cb, latMax_cb, lonMin_cb, lonMax_cb )
In [16]:
tropomi_iowa = L2(path_tropomi+'*2018-0[45678]*.nc', dict_tropomi,latMin_cb, latMax_cb, lonMin_cb, lonMax_cb )
In [17]:
# Again, convert tedious times in the files to something you can work with in python
oco2_iowa.time_python = convert_time(oco2_iowa.time, t_unit_oco2, t_cal)
tropomi_iowa.time_python = convert_time(tropomi_iowa.time, t_unit_tropomi, t_cal)
In [18]:
# now lets plot a timeseries
figsize(14,6)
plt.subplot(1,2,1)
nadir = np.where(oco2_iowa.mode==0)[0]
glint = np.where(oco2_iowa.mode==1)[0]
plt.plot(oco2_iowa.time_python[nadir], oco2_iowa.sif_757[nadir],'.',markersize=0.5, label='Nadir')
plt.plot(oco2_iowa.time_python[glint], oco2_iowa.sif_757[glint],'.',markersize=0.5, label='Glint')
plt.ylabel('SIF (W/m$^2$/sr/$\mu$m)')
plt.title('OCO2 (SIF at 758nm)')
plt.legend(loc=0)
plt.subplot(1,2,2)
plt.plot(tropomi_iowa.time_python, tropomi_iowa.sif,'.',markersize=0.5)
plt.ylabel('SIF (W/m$^2$/sr/$\mu$m)')
plt.title('TROPOMI (SIF at 740nm)')
Out[18]:
In [19]:
plt.scatter(tropomi_iowa.time_python, tropomi_iowa.sif,s=0.5, c=np.abs(tropomi_iowa.phaseAngle))
plt.colorbar()
plt.title('TROPOMI time-series with PhaseAngle color-coded')
Out[19]:
well, we have "somewhat" more data points using TROPOMI:
In [20]:
print('Number of OCO-2 data points in Iowa: ' + str(len(oco2_iowa.sif_757)))
print('Number of TROPOMI data points in Iowa: ' + str(len(tropomi_iowa.sif)))
You can see that individual data are too noisy to work with, thus we have to average. Looking at Iowa, we see that we have more than enough data for averaging. Now we can define a temporal resolution and create a running mean to drive down the noise (which should scale with $1/\sqrt{n}$):
In [21]:
# Let us define some averaging interval, get data every 3 days and create a running mean of +/- 3 days:
from datetime import datetime
# Define data range to smooth on (every 3 days)
dates = np.arange('2018-04', '2018-08', dtype='datetime64[3D]').astype(datetime)
In [22]:
# We used dates with a 3 day time-step and now use a +/- 3 day filter for the running mean:
iowa_timeseries_oco2 = sif_rMean(oco2_iowa.time_python, oco2_iowa.sif_757,dates, 3 )
iowa_timeseries_tropomi = sif_rMean(tropomi_iowa.time_python, tropomi_iowa.sif,dates, 3 )
In [23]:
# Conversion factor from 758nm to 740nm is roughly 1.55 (we can provide shapes if needed)
cf = 1.55
plt.errorbar(dates, iowa_timeseries_oco2.mean,yerr=iowa_timeseries_oco2.standard_error, label='OCO-2 Mean')
plt.errorbar(dates, iowa_timeseries_tropomi.mean,yerr=iowa_timeseries_tropomi.standard_error, label='TROPOMI Mean')
plt.ylabel('SIF (W/m$^2$/sr/$\mu$m)')
plt.legend(loc=0)
plt.title('Iowa Timeseries, +/-3 day running mean')
Out[23]:
We see that the time-series agree reasonably well but that there is an offset (or scaling factor) as we define SIF at different wavelength. Let us take a look at typical SIF shapes from the SCOPE model (https://github.com/Christiaanvandertol/SCOPE) or our leaf level measurements (https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.14662).
In [24]:
plt.plot(shape_scope[:,0], shape_scope[:,1], label='SCOPE')
plt.plot(shape_walz[:,0], shape_walz[:,1]*12, label='Leaf level measurements')
plt.title('normalized SIF shape')
plt.xlabel('Wavelength (nm)')
plt.legend()
Out[24]:
We can see that SIF changes quite a bit with wavelength but that the overall shape $>$740nm us pretty consistent between our leaf level measurements and the SCOPE model. The shorter wavelength is more affected by chlorophyll re-absorption, which can dramatically change the shape (also, we used red light as illumination for the leaf level measurements, which causes higher re-absorption as it penetrates deeper into the leaf).
We provide a simple routine to convert the scaling factor needed to convert data from wavelength X to Y: Let's say we want to convert OCO-2 data from 758nm to the TROPOMI grid of 740nm, we get the following scaling factor:
In [25]:
print('Factor from SCOPE ', convertWL(758,740,shape_scope))
print('Factor from leaf level ',convertWL(758,740,shape_walz))
fac = convertWL(758,740,shape_walz)
Now we can use the factor to better match the 2 time-series:
In [26]:
# Conversion factor from 758nm to 740nm is roughly 1.55 (we can provide shapes if needed)
plt.errorbar(dates, iowa_timeseries_oco2.mean*fac,yerr=iowa_timeseries_oco2.standard_error*fac, label='OCO-2 Mean')
plt.errorbar(dates, iowa_timeseries_tropomi.mean,yerr=iowa_timeseries_tropomi.standard_error, label='TROPOMI Mean')
plt.ylabel('SIF @740nm (W/m$^2$/sr/$\mu$m)')
plt.legend(loc=0)
plt.title('Iowa Timeseries, +/-3 day running mean')
Out[26]:
Voila, this looks much better
Another crucial impact is due to changes in the overpass time and how to relate the instantaenous SIF to a daily average. Our first simple approximations are just based on calculations of the ratio of a daily average PAR vs instantaenous PAR (assuming cloud free conditions). To first order, this should bring different instruments into better alignment. We provide this conversion factor for all instruments. For TROPOMI and OCO-2, it is not a huge factor as the overpass time is similar (note though the the factor changes between the eastern and western part of the TROPOMI swath as the local time changes quite a bit):
In [27]:
# Let's calculate the SIF timeseries including the length of day correction:
# Do the same for the length of day correction
iowa_timeseries_tropomiDC = sif_rMean(tropomi_iowa.time_python, tropomi_iowa.sif*tropomi_iowa.dcCorr,dates, 3 )
iowa_timeseries_oco2DC = sif_rMean(oco2_iowa.time_python, oco2_iowa.sif_757*oco2_iowa.dcCorr,dates, 3 )
In [28]:
plt.plot(dates, iowa_timeseries_oco2DC.mean*fac, label='OCO-2 LOD corrected')
plt.plot(dates, iowa_timeseries_tropomiDC.mean, label='TROPOMI LOD corrected')
plt.legend()
plt.ylabel('$\overline{SIF}$ @740nm (W/m$^2$/sr/$\mu$m)')
plt.title('Length of Day correction impact')
Out[28]:
no big change here as the overpass times are similar (but it will have an impact on GOME-2 comparisons!)
This is it, hope it was useful, Feedback and improvements are welcome! cfranken@caltech.edu
In [ ]: